 type hierarchy


Schema Inference for Tabular Data Repositories Using Large Language Models

arXiv.org Artificial Intelligence

Minimally curated tabular data often contain representational inconsistencies across heterogeneous sources and are accompanied by sparse metadata. Working with such data is intimidating. While prior work has advanced dataset discovery and exploration, schema inference remains difficult when metadata are limited. We present SI-LLM (Schema Inference using Large Language Models), which infers a concise conceptual schema for tabular data using only column headers and cell values. The inferred schema comprises hierarchical entity types, attributes, and inter-type relationships. In an extensive evaluation on two datasets drawn from web tables and open data, SI-LLM achieves promising end-to-end results, as well as better or comparable results to state-of-the-art methods at each step. All source code, full prompts, and datasets of SI-LLM are available at https://github.com/PierreWoL/SILLM.
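To make the shape of such a schema concrete, here is a minimal Python sketch of one possible in-memory representation: entity types linked to their supertypes, each carrying attributes, plus relationships between types. The class names (EntityType, Relationship, Schema), the film example, and the rule that a subtype inherits its supertypes' attributes are illustrative assumptions, not SI-LLM's actual output format; the linked repository is the place to check that.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EntityType:
    """A node in the type hierarchy; parent is None for top-level types."""
    name: str
    parent: Optional["EntityType"] = None
    attributes: list = field(default_factory=list)

    def ancestors(self):
        """Names of all supertypes, nearest first."""
        chain, node = [], self.parent
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return chain

    def all_attributes(self):
        """Own attributes plus those inherited from supertypes (assumed rule)."""
        inherited = self.parent.all_attributes() if self.parent is not None else []
        return inherited + [a for a in self.attributes if a not in inherited]


@dataclass
class Relationship:
    subject: EntityType
    predicate: str
    obj: EntityType


@dataclass
class Schema:
    types: list = field(default_factory=list)
    relationships: list = field(default_factory=list)

    def type_named(self, name):
        return next(t for t in self.types if t.name == name)


# Hypothetical schema inferred from a few film tables.
creative_work = EntityType("CreativeWork", attributes=["title"])
film = EntityType("Film", parent=creative_work, attributes=["release_year", "runtime"])
person = EntityType("Person", attributes=["name", "birth_date"])
schema = Schema(
    types=[creative_work, film, person],
    relationships=[Relationship(film, "directedBy", person)],
)
print(film.ancestors())       # ['CreativeWork']
print(film.all_attributes())  # ['title', 'release_year', 'runtime']
```

Keeping the hierarchy as parent links makes it cheap to walk upward from any type, which is the direction needed when a column matches a specific type but should also count as evidence for its supertypes.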


A Graphical Formalism for Commonsense Reasoning with Recipes

arXiv.org Artificial Intelligence

To address this shortcoming, we propose a high-level representation of recipes as labelled bipartite graphs where the first subset of nodes denotes the comestibles involved in the recipe (ingredients, intermediate food items, final products, i.e. dishes, and by-products) and the second subset of nodes denotes actions on those comestibles. The edges reflect the (possibly partial) sequence of steps taken in the recipe going from the ingredients to final products.

… used for actions and comestibles; Section 3 presents a representation of recipes as bipartite graphs; Section 4 considers acceptability of recipes; Section 5 presents definitions for comparing recipes; Section 6 presents definitions for composition of recipes from subrecipes; Section 7 presents substitution based on changing the type of nodes; Section 8 presents substitution based on changing the structure of the graph; Section 9 discusses related literature; and Section 10 …
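A labelled bipartite graph of this kind can be sketched in a few lines of Python. In this hypothetical RecipeGraph class (not taken from the paper), every edge must join a comestible to an action, ingredients are the comestibles that no action produces, and final products and by-products are the comestibles that no action consumes. The tea recipe and its type labels are made up for illustration.

```python
from collections import defaultdict


class RecipeGraph:
    """Labelled bipartite graph with comestible nodes and action nodes."""

    def __init__(self):
        self.comestibles = {}          # node id -> type label, e.g. "Water"
        self.actions = {}              # node id -> type label, e.g. "Boil"
        self.succ = defaultdict(set)   # directed edges following the recipe steps

    def add_comestible(self, node, label):
        self.comestibles[node] = label

    def add_action(self, node, label):
        self.actions[node] = label

    def add_edge(self, src, dst):
        for n in (src, dst):
            if n not in self.comestibles and n not in self.actions:
                raise KeyError(f"unknown node {n!r}")
        # Bipartite: exactly one endpoint must be an action.
        if (src in self.actions) == (dst in self.actions):
            raise ValueError(f"edge {src}->{dst} must join a comestible and an action")
        self.succ[src].add(dst)

    def ingredients(self):
        """Comestibles that no action produces (sources of the graph)."""
        produced = {d for targets in self.succ.values() for d in targets}
        return sorted(c for c in self.comestibles if c not in produced)

    def end_products(self):
        """Comestibles that no action consumes (final products and by-products)."""
        return sorted(c for c in self.comestibles if not self.succ.get(c))


# Hypothetical recipe: boil water, then steep a tea bag in it.
g = RecipeGraph()
for node, label in [("water", "Water"), ("hot_water", "Water"), ("teabag", "TeaBag"),
                    ("tea", "Tea"), ("used_teabag", "TeaBag")]:
    g.add_comestible(node, label)
g.add_action("boil", "Boil")
g.add_action("steep", "Steep")
for src, dst in [("water", "boil"), ("boil", "hot_water"), ("hot_water", "steep"),
                 ("teabag", "steep"), ("steep", "tea"), ("steep", "used_teabag")]:
    g.add_edge(src, dst)
print(g.ingredients())   # ['teabag', 'water']
print(g.end_products())  # ['tea', 'used_teabag']
```

Because node identity and node type label are stored separately, substituting one ingredient type for another only changes a label, while structural substitutions change the edges.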


Mining Wikidata for Name Resources for African Languages

arXiv.org Artificial Intelligence

This work supports further development of language technology for the languages of Africa by providing a Wikidata-derived resource of name lists corresponding to common entity types (person, location, and organization). While we are not the first to mine Wikidata for name lists, our approach emphasizes scalability and replicability and addresses data quality issues for languages that do not use Latin scripts. We produce lists containing approximately 1.9 million names across 28 African languages. We describe the data, the process used to produce it, and its limitations, and provide the software and data for public use. Finally, we discuss the ethical considerations of producing this resource and others of its kind.
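One common data-quality check for languages written in non-Latin scripts is to keep only labels whose letters mostly belong to the expected script. The Python sketch below does this with the standard unicodedata module; the function names, the 0.9 threshold, and the use of Unicode character names as a script test are assumptions for illustration, not the procedure the authors describe. The example checks labels against Ethiopic, the script used to write Amharic.

```python
import unicodedata


def script_share(name, script):
    """Fraction of letters in name whose Unicode character name starts with script.

    Character names (e.g. "ETHIOPIC SYLLABLE HA") act here as a rough stand-in
    for the Unicode Script property, which the stdlib unicodedata does not expose.
    """
    letters = [ch for ch in name if ch.isalpha()]
    if not letters:
        return 0.0
    hits = sum(unicodedata.name(ch, "").startswith(script) for ch in letters)
    return hits / len(letters)


def keep_label(name, script, threshold=0.9):
    """Hypothetical filter: keep a label only if it is (almost) all in script."""
    return script_share(name, script) >= threshold


# "Addis Ababa" in Ethiopic script, written with escapes to avoid encoding issues.
amharic = "\u12a0\u12f2\u1235 \u12a0\u1260\u1263"
print(keep_label(amharic, "ETHIOPIC"))        # True
print(keep_label("Addis Ababa", "ETHIOPIC"))  # False
```

A threshold below 1.0 tolerates the occasional embedded Latin abbreviation; a production pipeline would more likely use a library that exposes the Unicode Script property directly.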


Path Ranking with Attention to Type Hierarchies

arXiv.org Artificial Intelligence

Knowledge base completion is the problem of inferring missing information from existing facts in a knowledge base. Path-ranking based methods use sequences of relations as general patterns of paths for prediction. However, these patterns usually lack accuracy because they are generic and can often apply to widely varying scenarios. We leverage type hierarchies of entities to create a new class of path patterns that are both discriminative and generalizable. Then we propose an attention-based RNN model, which can be trained end-to-end, to discover the new path patterns most suitable for the data. Experiments conducted on two benchmark knowledge base completion datasets demonstrate that the proposed model outperforms existing methods by a statistically significant margin. Our quantitative analysis of the path patterns shows that they strike a balance between generalization and discrimination.
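The idea of abstracting a relation path with entity types can be illustrated as follows. Given the relations along a path and the entities between them, the hypothetical typed_patterns function below replaces each intermediate entity by one of its types, from most specific to most general, yielding candidate patterns of varying generality. It only enumerates candidates; in the paper an attention-based RNN learns which patterns suit the data, and that model is not reproduced here. The entity names, types, and parent map are invented for the example.

```python
from itertools import product


def type_chain(entity, entity_type, parent):
    """Types of entity, from its most specific type up to the root."""
    chain, t = [], entity_type[entity]
    while t is not None:
        chain.append(t)
        t = parent.get(t)
    return chain


def typed_patterns(relations, intermediates, entity_type, parent):
    """Enumerate patterns (r1, T1, r2, ..., rn) for one relation path.

    relations:      [r1, ..., rn], the relations along the path
    intermediates:  the n-1 entities between consecutive relations
    Each intermediate entity is abstracted to one type from its chain.
    """
    if len(intermediates) != len(relations) - 1:
        raise ValueError("need exactly one intermediate entity between relations")
    choices = [type_chain(e, entity_type, parent) for e in intermediates]
    patterns = []
    for types in product(*choices):
        pattern = [relations[0]]
        for t, r in zip(types, relations[1:]):
            pattern += [t, r]
        patterns.append(tuple(pattern))
    return patterns


# Hypothetical path: alice --bornIn--> paris --locatedIn--> france
entity_type = {"paris": "City"}
parent = {"City": "Location", "Location": None}
for p in typed_patterns(["bornIn", "locatedIn"], ["paris"], entity_type, parent):
    print(p)
# ('bornIn', 'City', 'locatedIn')
# ('bornIn', 'Location', 'locatedIn')
```

The more specific pattern is more discriminative, while the more general one matches more paths; choosing between them per relation is the trade-off the abstract describes.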


Path-Based Attention Neural Model for Fine-Grained Entity Typing

AAAI Conferences

Fine-grained entity typing aims to assign types, arranged in a hierarchical structure, to entity mentions in free text. It suffers from label noise in training data generated by distant supervision. Although recent studies use many features to prune wrong labels before training, they suffer from error propagation and introduce considerable complexity. In this paper, we propose an end-to-end typing model, called the path-based attention neural model (PAN), that achieves noise-robust performance by leveraging the hierarchical structure of types. Experiments on two data sets demonstrate its effectiveness.


Coarse-to-Fine Inference and Learning for First-Order Probabilistic Models

AAAI Conferences

Coarse-to-fine approaches use sequences of increasingly fine approximations to control the complexity of inference and learning. These techniques are often used in NLP and vision applications. However, no coarse-to-fine inference or learning methods have been developed for general first-order probabilistic domains, where the potential gains are even higher. We present our Coarse-to-Fine Probabilistic Inference (CFPI) framework for general coarse-to-fine inference for first-order probabilistic models, which leverages a given or induced type hierarchy over objects in the domain. Starting by considering the inference problem at the coarsest type level, our approach performs inference at successively finer grains, pruning high- and low-probability atoms before refining. CFPI can be applied with any probabilistic inference method and can be used in both propositional and relational domains. CFPI provides theoretical guarantees on the errors incurred, and these guarantees can be tightened when CFPI is applied to specific inference algorithms. We also show how to learn parameters in a coarse-to-fine manner to maximize the efficiency of CFPI. We evaluate CFPI with the lifted belief propagation algorithm on social network link prediction and biomolecular event prediction tasks. These experiments show CFPI can greatly speed up inference without sacrificing accuracy.
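The pruning loop at the heart of a coarse-to-fine scheme can be sketched in Python as a recursion over the type hierarchy. In this simplified version (an assumption, not CFPI itself), each type or object has a single probability estimate supplied by a stand-in function; if the estimate at a type is below low or above high, every object beneath it inherits that value and no finer inference is run. A real system would call an inference algorithm such as lifted belief propagation where this sketch looks up a table, and this toy carries none of CFPI's error guarantees.

```python
def leaves(node, children):
    """All objects (leaves) under node in the type hierarchy."""
    kids = children.get(node, [])
    if not kids:
        return [node]
    return [leaf for k in kids for leaf in leaves(k, children)]


def coarse_to_fine(node, children, estimate, low=0.05, high=0.95, out=None):
    """Assign a probability to every object under node, refining only when needed.

    estimate(t) returns a probability for the query atom at type or object t.
    If it falls below low or above high, the atom is pruned: all objects
    under t inherit the coarse value and their finer levels are never visited.
    """
    if out is None:
        out = {}
    p = estimate(node)
    kids = children.get(node, [])
    if not kids or p < low or p > high:
        for leaf in leaves(node, children):
            out[leaf] = p
        return out
    for kid in kids:
        coarse_to_fine(kid, children, estimate, low, high, out)
    return out


# Toy hierarchy and a lookup table standing in for a real inference method.
children = {"Entity": ["Person", "Place"],
            "Person": ["alice", "bob"],
            "Place": ["paris", "rome"]}
table = {"Entity": 0.5, "Person": 0.5, "Place": 0.01,
         "alice": 0.9, "bob": 0.2, "paris": 0.3, "rome": 0.4}
calls = []


def estimate(t):
    calls.append(t)  # record which levels were actually evaluated
    return table[t]


result = coarse_to_fine("Entity", children, estimate)
print(result)  # {'alice': 0.9, 'bob': 0.2, 'paris': 0.01, 'rome': 0.01}
print(calls)   # ['Entity', 'Person', 'alice', 'bob', 'Place']
```

The saving shows up in calls: the objects under Place are never evaluated because the coarse estimate already fell below the threshold, at the cost of replacing their individual values with the coarse one.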


Leveraging Ontologies for Lifted Probabilistic Inference and Learning

AAAI Conferences

Exploiting ontologies for efficient inference is one of the most widely studied topics in knowledge representation and reasoning. The use of ontologies for probabilistic inference, however, is much less developed. A number of algorithms for lifted inference in first-order probabilistic languages have been proposed, but their scalability is limited by the combinatorial explosion in the sets of objects that need to be considered. We propose a coarse-to-fine inference approach that leverages a class hierarchy to combat this problem. Starting at the highest level, our approach performs inference at successively finer grains, pruning low-probability atoms before refining. We provide bounds on the error incurred by this approach relative to full ground inference as a function of the pruning threshold. We also show how to learn parameters in a coarse-to-fine manner to maximize the opportunities for pruning during inference. Experiments on link prediction and biomolecular event prediction tasks show our method can greatly improve the scalability of lifted probabilistic inference.


The Implementation of Arabic Subject Markers in the LKB System

AAAI Conferences

Arabic Subject Markers are interface phenomena (specifically, between morphology and syntax). In this paper, I describe them briefly, give my linguistic analysis within the framework of Head-Driven Phrase Structure Grammar, and show how I implement them in the LKB system. I show that this system, despite its strengths, does not allow for a proper implementation of these units.


Dealing with Metonymic Readings of Named Entities

arXiv.org Artificial Intelligence

The aim of this paper is to propose a method for tagging named entities (NE) using natural language processing techniques. Beyond their literal meaning, named entities are frequently subject to metonymy. We show the limits of current NE type hierarchies and detail a new proposal aimed at dynamically capturing the semantics of entities in context. This model can analyze complex linguistic phenomena like metonymy, which are known to be difficult for natural language processing but crucial for most applications. We present an implementation, test it on the French ESTER corpus, and report significant results.